First we import and merge the data from three different sources:
Ward (Ward, n.d.a) from: https://osf.io/p5xsd/files/osfstorage
Rothen (Rothen et al. 2016)
Van Petersen (Van Petersen et al. 2020a)
Then we compute several features that could describe synesthete consistency, in several parts: (A) we replicate features found in the literature (i.e. Van Petersen et al. (2020b)); (B) we extract features based on the form; (C) we harness a geography package to compute segment-based features; (D) we compute polygon-based features; (E) convex hull; (F) angles.
Each feature is presented with the following structure:
Compute Feature
Example
Receiver Operator Curve (ROC)
Finally we have a summary table presenting:
In addition, some (not very successful, IMO) machine learning to try to find out which feature combination could best diagnose space sequence synesthesia.
*Synesthesia* is the concomitant perception arising from two different senses or modalities; for example, some humans perceive numbers as having a well-defined position in space. Synesthetes come in all colours and flavours, such as …
Space Sequence Synesthesia (SSS) is a phenomenon present in some humans who perceive a spatial property for some stimuli. One of the first reports of this phenomenon describes a particular spatial placement for numerals (Galton 1880).
A strict definition of synesthesia requires these five criteria:
Automaticity: the inducer automatically triggers the concurrent. For example, February might automatically trigger a specific location in the top-left peri-space, analogously to how a colour word activates the colour in literate humans (see the Stroop effect).
Unidirectionality: the inducer triggers the concurrent, but the concurrent does not trigger the inducer. For example, if February automatically triggers the top-left peri-space, the top-left peri-space does not trigger February.
Consciousness: The experience is conscious. For example a synesthete is conscious of his or her perception of February in the top left peri-space.
Development: should be present early in development. For example, seeing months in particular spatial locations already occurred in childhood.
Consistency: the inducer-concurrent pair is stable over time. For example, February is perceived on the top left regardless of the time of day or age (although some changes might occur with aging).
Consistency is the criterion most suited for experimental settings, since it can be tested by repeatedly presenting specific inducers to participants and collecting the responses for their concurrents. If comparatively similar responses are given for the same inducers, then synesthesia can be detected. Such tests have become the gold standard for detecting synesthesia successfully, for example colour-grapheme synesthesia using a colour picker (Rothen et al. 2013). The transposition of this method to SSS has however not yielded convincing criteria (see Ward; Rothen). Instead of a colour picker, participants with SSS are asked to position a set of inducers on their idiosyncratic concurrent locations on the screen. If each inducer is repeated several times, we can then compute the area between the responses for each inducer (i.e. a triangle if repeated three times). The sum or grand average of the triangle areas across several inducers of several conditions (i.e. numbers, weekdays and months) is then used to estimate individual consistencies. The smaller the total, the more consistent the individual responses are. Despite yielding satisfactory results, this approach has several limits: a participant can give all responses at the same position on the screen and obtain excellent consistency scores.
In the following we aim to take advantage of two properties of synesthetic responses: they give rise to a form (i.e. a number form, see Galton) that follows a sequential order (or ordinality).
We harnessed a geographical package [ADD REF] to extract geometrical features from participant responses. For example, we can extract polygons from each condition and compute the area of these polygons.
In the following I upload and merge the data from Ward, Rothen and Van Petersen. Data is stored into a full dataset ds (i.e. 1 row per trial) and a per-participant dataset ds_Quest (i.e. 1 row per participant).
## New names:
## • `` -> `...36`
## • `` -> `...37`
## [1] 0
We exclude 2 participants for whom we could not compute the z-scores (i.e. invalid responses).
Manually adjust some inconsistent screen sizes:
## [1] NA
We initialize an empty dataframe to collect ROC specifications of each features:
Definition: calculating consistency. Each stimulus is represented by three xy coordinates, (x1, y1), (x2, y2), (x3, y3), from the three repetitions. For each stimulus, the area of the triangle bounded by the coordinates is calculated as follows:

\(Area = (x_1 y_2 + x_2 y_3 + x_3 y_1 - x_1 y_3 - x_2 y_1 - x_3 y_2) / 2\)
The mean area is calculated by adding together the areas of all stimuli and dividing by 29. This unit is transformed into a percentage area taking into account the different pixel resolution of each participant:

\(Mean\ area = (Summed\ area / 29) \times 100 / ScreenArea\), where \(ScreenArea = Xpixels \times Ypixels\)
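As a minimal base-R sketch of the formulas above (function names are illustrative, not from the analysis code; an absolute value is added since the shoelace formula is otherwise signed by orientation):

```r
# Area of the triangle spanned by the three repetitions of one stimulus,
# using the shoelace formula from the definition above.
triangle_area <- function(x, y) {
  abs(x[1] * y[2] + x[2] * y[3] + x[3] * y[1] -
      x[1] * y[3] - x[2] * y[1] - x[3] * y[2]) / 2
}

# Mean area as a percentage of the screen area (29 stimuli in total)
mean_area_pct <- function(areas, x_pixels, y_pixels) {
  (sum(areas) / 29) * 100 / (x_pixels * y_pixels)
}

# Example: a right triangle with legs of length 2 has area 2
triangle_area(c(0, 2, 0), c(0, 0, 2))  # 2
```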
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls > cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 0.0765602 | 76.43098 | 65.31365 | 70.71651 | 71.65992 |

|  | Ctl | Syn |
|---|---|---|
| Ctl | 94 (34.7%) | 177 (65.3%) |
| Syn | 227 (76.4%) | 70 (23.6%) |
Replicate Rothen's method. This might take some time to compute.
“Calculating chance levels of consistency To create permuted datasets for each participant: the 87 xy coordinates are randomly shuffled so they are no longer linked to the original data labels (“Monday”, “5”, “April”, etc.). The mean area of the triangles based on the shuffled coordinates is computed (as described above), and the whole process is repeated 1000 times to obtain a subject-specific distribution of chance levels of consistency. A z-score is calculated comparing the observed consistency against the mean and SD of the permuted data: \(Z = [(observed consistency) – (mean consistency of permuted data)] / (SD of permuted data)\)”
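The permutation scheme in the quote can be sketched in base R as follows (this is a simplified stand-in, not the adapted OSF code; `consistency_fn` is a placeholder for whichever consistency measure is used, e.g. the mean triangle area):

```r
# Subject-specific chance level of consistency via permutation.
# `coords` is an 87 x 2 matrix of responses (29 stimuli x 3 repetitions);
# `consistency_fn` computes the observed consistency from such a matrix.
permutation_z <- function(coords, consistency_fn, n_perm = 1000) {
  observed <- consistency_fn(coords)
  permuted <- replicate(n_perm, {
    # shuffle rows so coordinates are no longer linked to their labels
    shuffled <- coords[sample(nrow(coords)), , drop = FALSE]
    consistency_fn(shuffled)
  })
  # z-score of the observed consistency against the permuted distribution
  (observed - mean(permuted)) / sd(permuted)
}
```

Strongly negative z-scores then indicate responses far more consistent than chance.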
Code retrieved from OSF (adapted here):
As in Ward:
“Specifically, the standard deviation of the x-coordinates and/or the standard deviation of the y-coordinates (measured across all trials) should exceed a proposed value of 0.075 for a normalized screen with width and height of 1 unit.”
“A participant who produced a horizontal straight-line form would have a very low standard deviation in the y-coordinates but a high standard deviation in x-coordinates, and a participant with a vertical line would have the reverse profile. A participant with a circular spatial form would be high on both. A participant who clicks randomly around the screen would also be high on both x and y standard deviation, but would fail the consistency tests (the triangles would be large).”
Hence the SD is used in combination with consistency.
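Ward's spatial-extent criterion can be sketched as follows (the 0.075 cutoff comes from the quote; coordinates are assumed normalized to a unit screen, and the function name is made up):

```r
# Ward's spatial-extent criterion: on a screen normalized to 1 x 1 units,
# the SD of the x-coordinates and/or y-coordinates should exceed 0.075.
passes_sd_criterion <- function(x, y, cutoff = 0.075) {
  sd(x) > cutoff || sd(y) > cutoff
}

# A horizontal straight-line form: high SD in x, zero SD in y -> passes
passes_sd_criterion(x = seq(0.1, 0.9, length.out = 29),
                    y = rep(0.5, 29))  # TRUE
```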
Would need an example with all in the center
WORK ON THIS
I need to review the paper to try to replicate the ROC (hence with Ward's dataset). Then generalize it to the full dataset. Also need to figure out how to integrate the three different criteria.
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 147.8816 | 92.25589 | 41.69742 | 63.42593 | 83.08824 |

|  | Ctl | Syn |
|---|---|---|
| Ctl | 113 (41.7%) | 158 (58.3%) |
| Syn | 23 (7.7%) | 274 (92.3%) |
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 91.6897 | 81.81818 | 59.7786 | 69.03409 | 75 |

|  | Ctl | Syn |
|---|---|---|
| Ctl | 162 (59.8%) | 109 (40.2%) |
| Syn | 54 (18.2%) | 243 (81.8%) |
These new measures aim to take advantage of several properties:

- ordinality
- synesthetic forms

Hence we aim to take advantage of some geometrical features of the synesthetic forms. For example, we can define segments across the ordered stimuli (i.e. from 1 to 9, Monday to Sunday and January to December).
An idea I have is to look into the lines and order of the forms. I would exclude cases where lines cross (since we expect forms, crossing lines mean no form is produced). Needs refinement.
I think that the number of stimuli per condition should be taken into account (i.e. 9 numbers, 7 days, 12 months). Hence the count would need to be divided by this number of stimuli.
In each condition the connected x and y coordinates generate segments; hence the number of segments is length(stimuli) - 1. Moreover, currently, each stimulus is connected by 3 segments, one for each of the 3 repetitions. So, dividing by 3, we have the average number of segment crossings per condition. Next we sum these for each ID. Ideally we should compute the number of crossings across the repetitions; besides making it more complex, this would also be computationally more demanding, and I don't believe it would lead to a significant difference.
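A minimal base-R sketch of counting crossings between the segments of one ordered form (a stand-in for the actual implementation; proper handling of collinear cases is omitted):

```r
# TRUE if segments p1-p2 and p3-p4 properly cross.
# Each argument is a length-2 numeric vector (x, y).
segments_cross <- function(p1, p2, p3, p4) {
  orient <- function(a, b, c) {
    sign((b[1] - a[1]) * (c[2] - a[2]) - (b[2] - a[2]) * (c[1] - a[1]))
  }
  orient(p1, p2, p3) != orient(p1, p2, p4) &&
    orient(p3, p4, p1) != orient(p3, p4, p2)
}

# Count crossings among the consecutive segments of an ordered form.
count_crossings <- function(x, y) {
  n <- length(x) - 1  # number of segments
  crossings <- 0
  for (i in seq_len(n - 2)) {
    for (j in (i + 2):n) {  # skip adjacent segments sharing an endpoint
      if (segments_cross(c(x[i], y[i]), c(x[i + 1], y[i + 1]),
                         c(x[j], y[j]), c(x[j + 1], y[j + 1]))) {
        crossings <- crossings + 1
      }
    }
  }
  crossings
}

# A zig-zag path has no proper crossings
count_crossings(c(0, 1, 0, 1), c(0, 0, 1, 1))  # 0
```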
IMPORTANT: the data frame needs to be informed of the stimulus order to make sense!
The question is, should I sum the features across conditions or average them? I know some conditions contain responses at the exact same coordinates. Also, the conditions don't have the same number of stimuli, i.e.:

- weeks: 7
- months: 12
- numbers: 10

Hence months are more likely to have self-intersections than weeks. But also some participants did not respond on specific conditions. How is that important?
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls > cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 1.270115 | 76.76768 | 62.36162 | 69.09091 | 71.0084 |

|  | Ctl | Syn |
|---|---|---|
| Ctl | 102 (37.6%) | 169 (62.4%) |
| Syn | 228 (76.8%) | 69 (23.2%) |
Analyzing each repetition separately might favour horizontal positioning based on left-to-right (LTR) order. For example, using the strategy where the number 0 is always positioned on the left and 9 on the right (see MNL), there might be no intersections even without synesthesia. However, it is more unlikely that this would work across repetitions (i.e. having the same vertical position). So I need to add a criterion on the number of intersections across repetitions. This would however only work if I exclude the segment from the last stimulus of one repetition to the first stimulus of the next.
With 3 repetitions we have:

- 1 vs 2
- 2 vs 3
- 3 vs 1
We will take advantage of the sf package.
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
## Spherical geometry (s2) switched off
## X Y L1
## [1,] -0.33939732 0.03257203 1
## [2,] -0.06443808 -1.45129143 1
## [3,] 0.80626616 -0.54448598 1
## [4,] 1.05067436 -0.88453802 1
## [5,] 0.95902129 -1.47190064 1
## [6,] 0.60768449 1.05272817 1
## [7,] -0.27829526 1.00120513 1
## [8,] -1.39340771 0.64054387 1
## [9,] -2.20300990 0.54780240 1
## [10,] -1.92805067 -0.20443393 1
## [1] 30
## [1] 63
## [1] 168
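As an illustration of the sf workflow, the ordered responses of one condition can be turned into a LINESTRING and queried for simplicity, i.e. absence of self-intersection (the coordinates here are made up):

```r
library(sf)

# Ordered responses of one hypothetical condition (one x, y per stimulus)
pts <- rbind(c(0, 0), c(1, 0), c(2, 1), c(1, 2), c(0, 1))

# The ordered form as a LINESTRING; st_is_simple() is TRUE when the
# line does not intersect itself
form <- st_sfc(st_linestring(pts))
st_is_simple(form)  # TRUE: this form has no self-intersection

# A "bowtie" ordering does self-intersect
bowtie <- st_sfc(st_linestring(rbind(c(0, 0), c(2, 2), c(2, 0), c(0, 2))))
st_is_simple(bowtie)  # FALSE
```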
TO ADD
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls > cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 7.856126 | 83.83838 | 49.07749 | 64.34109 | 73.48066 |

|  | Ctl | Syn |
|---|---|---|
| Ctl | 138 (50.9%) | 133 (49.1%) |
| Syn | 249 (83.8%) | 48 (16.2%) |
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| -Inf | 100 | 0 | 52.28873 | NaN |
| Inf | 0 | 100 | NaN | 47.71127 |
## Warning in `[<-.data.frame`(`*tmp*`, 7, , value = list("BtwDist", 41.89, :
## replacement element 3 has 2 rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, 7, , value = list("BtwDist", 41.89, :
## replacement element 4 has 2 rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, 7, , value = list("BtwDist", 41.89, :
## replacement element 5 has 2 rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, 7, , value = list("BtwDist", 41.89, :
## replacement element 6 has 2 rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, 7, , value = list("BtwDist", 41.89, :
## replacement element 7 has 2 rows to replace 1 rows
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 1.288086 | 59.25926 | 70.4797 | 68.75 | 61.21795 |

|  | Ctl | Syn |
|---|---|---|
| Ctl | 191 (70.5%) | 80 (29.5%) |
| Syn | 121 (40.7%) | 176 (59.3%) |
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 0.1666667 | 74.41077 | 56.45756 | 65.19174 | 66.81223 |

|  | Ctl | Syn |
|---|---|---|
| Ctl | 153 (56.5%) | 118 (43.5%) |
| Syn | 76 (25.6%) | 221 (74.4%) |
We next check whether each polygon is topologically valid:
From the package description: "For projected geometries, st_make_valid uses the lwgeom_makevalid method also used by the PostGIS command ST_makevalid if the GEOS version linked to is smaller than 3.8.0, and otherwise the version shipped in GEOS; for geometries having ellipsoidal coordinates s2::s2_rebuild is being used." From https://postgis.net/docs/ST_IsValid.html: the value is well-formed and valid in 2D according to the OGC (Open Geospatial Consortium) rules.
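For instance (with hypothetical coordinates), a "bowtie" polygon built from out-of-order responses is invalid, and st_make_valid() repairs it:

```r
library(sf)

# A self-intersecting ("bowtie") polygon is topologically invalid
bowtie <- st_sfc(st_polygon(list(rbind(c(0, 0), c(2, 2), c(2, 0),
                                       c(0, 2), c(0, 0)))))
st_is_valid(bowtie)  # FALSE

# st_make_valid() rebuilds it into a valid (multi)polygon
repaired <- st_make_valid(bowtie)
st_is_valid(repaired)  # TRUE
```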
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 1.5 | 70.70707 | 75.27675 | 75.81227 | 70.10309 |

|  | Ctl | Syn |
|---|---|---|
| Ctl | 204 (75.3%) | 67 (24.7%) |
| Syn | 87 (29.3%) | 210 (70.7%) |
See: https://r-spatial.org/book/03-Geometries.html#sec-opgeom See: https://en.wikipedia.org/wiki/DE-9IM
DE-9IM is a standard for describing the topological relation between two geometries. It is called by st_relate, which returns a 3 × 3 matrix (DE9IM) for each relation:
\({\displaystyle \operatorname {DE9IM} (a,b)={\begin{bmatrix}\dim(I(a)\cap I(b))&\dim(I(a)\cap B(b))&\dim(I(a)\cap E(b))\\\dim(B(a)\cap I(b))&\dim(B(a)\cap B(b))&\dim(B(a)\cap E(b))\\\dim(E(a)\cap I(b))&\dim(E(a)\cap B(b))&\dim(E(a)\cap E(b))\end{bmatrix}}}\)
where \(\dim()\) is the dimension of the intersection (∩) of the interior (I), boundary (B), and exterior (E) of geometries a and b.

Hence it returns a spatial predicate defined over these domains:
## Ctl Syn Subs
## 212101212 952 1568 0.6071429
## 2121012F2 2 2 1.0000000
## 21210F212 0 1 0.0000000
## 212111212 4 8 0.5000000
## 212F01212 18 8 2.2500000
## 212F01FF2 15 2 7.5000000
## 212F0F212 12 2 6.0000000
## 212F0FFF2 5 1 5.0000000
## 212F1FFF2 1 0 Inf
## 2F2101212 13 1 13.0000000
## 2F21012F2 10 4 2.5000000
## 2F2F01212 16 2 8.0000000
## 2F2F01FF2 41 4 10.2500000
## 2FF10F212 12 1 12.0000000
## 2FF10F2F2 3 0 Inf
## 2FF11F212 1 0 Inf
## 2FFF0F212 32 3 10.6666667
## 2FFF0FFF2 138 19 7.2631579
## 2FFF1FFF2 626 853 0.7338804
## FF2F01212 6 0 Inf
## FF2F01FF2 27 6 4.5000000
## FF2F11212 1 0 Inf
## FF2FF1212 294 124 2.3709677
## FFFF0F212 26 2 13.0000000
## FFFF0FFF2 184 62 2.9677419
## Ctl Syn Subs
## 212101212 1009 1629 0.6193984
## 2121012F2 0 1 0.0000000
## 21210F212 2 3 0.6666667
## 21210F2F2 1 0 Inf
## 212111212 12 9 1.3333333
## 212F01212 12 2 6.0000000
## 212F01FF2 11 0 Inf
## 212F0F212 12 2 6.0000000
## 212F0FFF2 5 1 5.0000000
## 212F11212 1 0 Inf
## 212F11FF2 1 0 Inf
## 2F2101212 13 7 1.8571429
## 2F21012F2 27 6 4.5000000
## 2F2111212 1 0 Inf
## 2F2F01212 20 1 20.0000000
## 2F2F01FF2 44 10 4.4000000
## 2FF10F212 21 2 10.5000000
## 2FF10F2F2 8 1 8.0000000
## 2FF11F2F2 1 0 Inf
## 2FFF0F212 59 6 9.8333333
## 2FFF0FFF2 173 31 5.5806452
## 2FFF1FFF2 591 846 0.6985816
## FF2F01212 4 2 2.0000000
## FF2F01FF2 40 4 10.0000000
## FF2F11212 3 0 Inf
## FF2FF1212 99 39 2.5384615
## FFFF0F212 31 8 3.8750000
## FFFF0FFF2 238 63 3.7777778
## Ctl Syn Subs
## 212101212 939 1569 0.5984704
## 2121012F2 0 2 0.0000000
## 21210F212 0 1 0.0000000
## 21210F2F2 1 0 Inf
## 212111212 8 7 1.1428571
## 212F01212 12 3 4.0000000
## 212F01FF2 14 1 14.0000000
## 212F0F212 23 7 3.2857143
## 212F0FFF2 5 0 Inf
## 212F11212 1 0 Inf
## 2F2101212 18 5 3.6000000
## 2F21012F2 11 1 11.0000000
## 2F2111212 1 0 Inf
## 2F2F01212 12 1 12.0000000
## 2F2F01FF2 44 3 14.6666667
## 2FF10F212 7 0 Inf
## 2FF10F2F2 4 1 4.0000000
## 2FFF0F212 39 8 4.8750000
## 2FFF0FFF2 147 27 5.4444444
## 2FFF1FFF2 611 853 0.7162954
## FF2F01212 6 2 3.0000000
## FF2F01FF2 30 4 7.5000000
## FF2F11212 2 0 Inf
## FF2FF1212 279 125 2.2320000
## FFFF0F212 40 4 10.0000000
## FFFF0FFF2 185 49 3.7755102
## Ctl Syn Subs
## 2121012122 939 1569 0.5984704
## 212101212 952 1568 0.6071429
## 2121012121 1009 1629 0.6193984
## 2FFF1FFF21 591 846 0.6985816
## 2FFF1FFF22 611 853 0.7162954
## 2FFF1FFF2 626 853 0.7338804
## FF2FF12122 279 125 2.2320000
## FF2FF1212 294 124 2.3709677
2FFF1FFF2:

- S1 Interior vs. S2 Interior: the interiors intersect in 2 dimensions (2).
- S1 Interior vs. S2 Boundary: no intersection (F).
- S1 Interior vs. S2 Exterior: no intersection (F).
- S1 Boundary vs. S2 Interior: no intersection (F).
- S1 Boundary vs. S2 Boundary: a 1-dimensional intersection occurs, e.g. they share a common line segment (1).
- S1 Boundary vs. S2 Exterior: no intersection (F).
- S1 Exterior vs. S2 Interior: no intersection (F).
- S1 Exterior vs. S2 Boundary: no intersection (F).
- S1 Exterior vs. S2 Exterior: the exteriors intersect in 2 dimensions (2).
2FFF0FFF2:

- 2: the intersection of the two interiors creates a polygon (a two-dimensional intersection).
- F: the interior of the first geometry does not intersect the boundary of the second.
- F: the interior of the first geometry does not intersect the exterior of the second.
- F: the boundary of the first geometry does not intersect the interior of the second.
- 0: the two boundaries intersect at a point (a zero-dimensional intersection).
- F: the boundary of the first geometry does not intersect the exterior of the second.
- F: the exterior of the first geometry does not intersect the interior of the second.
- F: the exterior of the first geometry does not intersect the boundary of the second.
- 2: the intersection of the two exteriors creates a polygon (a two-dimensional intersection).
FFFF0FFF2:

- F: the intersection of the two interiors is empty.
- F: the intersection of the interior of the first geometry with the boundary of the second is empty.
- F: the intersection of the interior of the first geometry with the exterior of the second is empty.
- F: the intersection of the boundary of the first geometry with the interior of the second is empty.
- 0: the intersection of the two boundaries is a point (0-dimensional).
- F: the intersection of the boundary of the first geometry with the exterior of the second is empty.
- F: the intersection of the exterior of the first geometry with the interior of the second is empty.
- F: the intersection of the exterior of the first geometry with the boundary of the second is empty.
- 2: the intersection of the two exteriors is a 2-dimensional area.
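Some of the patterns discussed above can be reproduced with st_relate() on toy squares (the coordinates are illustrative):

```r
library(sf)

# Build a unit square with lower-left corner at (x0, y0)
square <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

a   <- st_sfc(square(0, 0))
b   <- st_sfc(square(0, 0))  # identical square
far <- st_sfc(square(5, 5))  # disjoint square

st_relate(a, b)    # "2FFF1FFF2": equal geometries
st_relate(a, far)  # "FF2FF1212": disjoint geometries
```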
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 17078.5 | 93.60269 | 23.24723 | 57.20165 | 76.82927 |
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 2.5 | 75.75758 | 41.32841 | 58.59375 | 60.86957 |

|  | Ctl | Syn |
|---|---|---|
| Ctl | 112 (41.3%) | 159 (58.7%) |
| Syn | 72 (24.2%) | 225 (75.8%) |
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
| threshold | sensitivity | specificity | ppv | npv |
|---|---|---|---|---|
| 1.491143 | 74.74747 | 47.97048 | 61.15702 | 63.41463 |

|  | Ctl | Syn |
|---|---|---|
| Ctl | 130 (48%) | 141 (52%) |
| Syn | 75 (25.3%) | 222 (74.7%) |
## Warning in st_cast.sf(ds_segm, "POINT"): repeating attributes for all
## sub-geometries for which they may not be constant
## Warning: Removed 16104 rows containing non-finite outside the scale range
## (`stat_density()`).
## [1] 444
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
##
## Call:
## roc.formula(formula = group ~ quadrant, data = ds[!is.nan(ds$quadrant), ], percent = TRUE, ci = TRUE, boot.n = 100, ci.alpha = 0.9, stratified = FALSE, plot = TRUE, auc.polygon = TRUE, max.auc.polygon = TRUE, grid = TRUE, print.auc = TRUE, show.thres = TRUE)
##
## Data: quadrant in 15621 controls (group Ctl) < 21563 cases (group Syn).
## Area under the curve: 52.87%
## 95% CI: 52.31%-53.44% (DeLong)
|  | Feature | AUC | threshold | sensitivity | specificity | ppv | npv | high_ci | low_ci |
|---|---|---|---|---|---|---|---|---|---|
| 10 | isValidPoly | 77.7473 | 1.500000e+00 | 70.70707 | 75.27675 | 75.81227 | 70.10309 | 74.01116 | 77.74734 |
| 4 | LineInter | 72.8354 | 1.270115e+00 | 76.76768 | 62.36162 | 69.09091 | 71.00840 | 68.60864 | 72.83536 |
| 1 | Consistency_zs | 70.3685 | 7.656020e-02 | 76.43098 | 65.31365 | 70.71651 | 71.65992 | 65.81407 | 70.36851 |
| 8 | areaPoly | 69.8958 | 1.288086e+00 | 59.25926 | 70.47970 | 68.75000 | 61.21795 | 65.64189 | 69.89576 |
| 3 | SD_ID_y | 69.6088 | 9.168970e+01 | 81.81818 | 59.77860 | 69.03409 | 75.00000 | 65.10650 | 69.60876 |
| 9 | isSimplePoly | 69.3572 | 1.666667e-01 | 74.41077 | 56.45756 | 65.19174 | 66.81223 | 65.06818 | 69.35716 |
| 2 | SD_ID_x | 64.3321 | 1.478816e+02 | 92.25589 | 41.69742 | 63.42593 | 83.08824 | 59.65966 | 64.33213 |
| 6 | Segm_leng | 63.7755 | 7.856126e+00 | 83.83838 | 49.07749 | 64.34109 | 73.48066 | 59.07019 | 63.77552 |
| 12 | isClockwise | 61.4279 | 2.500000e+00 | 75.75758 | 41.32841 | 58.59375 | 60.86957 | 56.87297 | 61.42793 |
| 13 | areaVhull | 56.0389 | 1.491143e+00 | 74.74747 | 47.97048 | 61.15702 | 63.41463 | 51.11374 | 56.03886 |
| 11 | relateReciepe | 55.6077 | 1.707850e+04 | 93.60269 | 23.24723 | 57.20165 | 76.82927 | 51.33587 | 55.60774 |
| 7 | BtwDist | 41.89 | -Inf | 100.00000 | 0.00000 | 52.28873 | NaN | 37.66036 | 41.88999 |
| 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls > cases
##
## Bootstrap test for two correlated ROC curves
##
## data: Sum_isValidStruct and Consistency_zs in ds_Q by group (Ctl, Syn)
## D = 5.9995, boot.n = 100, boot.stratified = 1, p-value = 1.979e-09
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## pAUC (100-90 specificity) of roc1 pAUC (100-90 specificity) of roc2
## 3.05545340 0.06696734
Following https://www.tidymodels.org/start/recipes/
## ── Attaching packages ────────────────────────────────────── tidymodels 1.4.1 ──
## ✔ broom 1.0.10 ✔ rsample 1.3.1
## ✔ dials 1.4.2 ✔ tailor 0.1.0
## ✔ infer 1.0.9 ✔ tune 2.0.0
## ✔ modeldata 1.5.1 ✔ workflows 1.3.0
## ✔ parsnip 1.3.3 ✔ workflowsets 1.1.1
## ✔ purrr 1.1.0 ✔ yardstick 1.3.2
## ✔ recipes 1.3.1
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ infer::conf_int() masks papaja::conf_int()
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ yardstick::spec() masks readr::spec()
## ✖ recipes::step() masks stats::step()
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
## Loaded glmnet 4.1-10
##
## Attaching package: 'vip'
## The following object is masked from 'package:utils':
##
## vi
## # A tibble: 75 × 8
## penalty mixture .metric .estimator mean n std_err .config
## <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 0.0000000001 0 accuracy binary 0.725 5 0.0285 pre0_mod01_p…
## 2 0.0000000001 0 brier_class binary 0.178 5 0.00880 pre0_mod01_p…
## 3 0.0000000001 0 roc_auc binary 0.809 5 0.0189 pre0_mod01_p…
## 4 0.0000000001 0.25 accuracy binary 0.732 5 0.0320 pre0_mod02_p…
## 5 0.0000000001 0.25 brier_class binary 0.181 5 0.0103 pre0_mod02_p…
## 6 0.0000000001 0.25 roc_auc binary 0.803 5 0.0209 pre0_mod02_p…
## 7 0.0000000001 0.5 accuracy binary 0.732 5 0.0320 pre0_mod03_p…
## 8 0.0000000001 0.5 brier_class binary 0.181 5 0.0104 pre0_mod03_p…
## 9 0.0000000001 0.5 roc_auc binary 0.803 5 0.0209 pre0_mod03_p…
## 10 0.0000000001 0.75 accuracy binary 0.736 5 0.0312 pre0_mod04_p…
## # ℹ 65 more rows
## # A tibble: 3 × 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.736 pre0_mod0_post0
## 2 roc_auc binary 0.813 pre0_mod0_post0
## 3 brier_class binary 0.179 pre0_mod0_post0
## ══ Workflow [trained] ══════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: logistic_reg()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 1 Recipe Step
##
## • step_normalize()
##
## ── Model ───────────────────────────────────────────────────────────────────────
##
## Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = "binomial", alpha = ~0.75)
##
## Df %Dev Lambda
## 1 0 0.00 0.305200
## 2 1 2.07 0.278100
## 3 1 3.92 0.253400
## 4 1 5.56 0.230800
## 5 2 7.28 0.210300
## 6 2 9.00 0.191700
## 7 2 10.52 0.174600
## 8 3 11.94 0.159100
## 9 3 13.27 0.145000
## 10 3 14.44 0.132100
## 11 3 15.45 0.120400
## 12 3 16.33 0.109700
## 13 4 17.22 0.099930
## 14 4 18.04 0.091050
## 15 4 18.75 0.082960
## 16 4 19.37 0.075590
## 17 4 19.90 0.068880
## 18 5 20.41 0.062760
## 19 5 20.96 0.057180
## 20 6 21.46 0.052100
## 21 8 21.91 0.047470
## 22 8 22.32 0.043260
## 23 8 22.68 0.039410
## 24 8 22.98 0.035910
## 25 8 23.25 0.032720
## 26 9 23.48 0.029820
## 27 9 23.70 0.027170
## 28 10 23.89 0.024750
## 29 10 24.07 0.022550
## 30 10 24.22 0.020550
## 31 11 24.35 0.018720
## 32 11 24.54 0.017060
## 33 11 24.72 0.015550
## 34 11 24.88 0.014160
## 35 11 25.02 0.012910
## 36 11 25.14 0.011760
## 37 11 25.25 0.010720
## 38 11 25.34 0.009763
## 39 11 25.42 0.008896
## 40 11 25.49 0.008106
## 41 11 25.55 0.007385
## 42 11 25.60 0.006729
## 43 11 25.65 0.006132
## 44 11 25.69 0.005587
## 45 11 25.72 0.005091
## 46 11 25.75 0.004638
##
## ...
## and 22 more lines.
## # A tibble: 0 × 5
## # ℹ 5 variables: term <chr>, step <dbl>, estimate <dbl>, lambda <dbl>,
## # dev.ratio <dbl>
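As a package-free sketch of the idea behind the tuned model (a plain logistic regression standing in for the penalized glmnet workflow above; data and feature values are simulated, only the feature names echo the summary table):

```r
# Minimal logistic-regression sketch on simulated data, standing in for
# the penalized (glmnet) model tuned above. Values are made up.
set.seed(42)
n <- 200
ds_sim <- data.frame(
  isValidPoly    = rnorm(n),
  Consistency_zs = rnorm(n)
)
# Simulate group membership from a known linear combination
p <- plogis(1.2 * ds_sim$isValidPoly - 0.8 * ds_sim$Consistency_zs)
ds_sim$group <- factor(rbinom(n, 1, p), labels = c("Ctl", "Syn"))

fit <- glm(group ~ isValidPoly + Consistency_zs,
           data = ds_sim, family = binomial())

# In-sample accuracy at a 0.5 probability cutoff
pred <- ifelse(predict(fit, type = "response") > 0.5, "Syn", "Ctl")
mean(pred == ds_sim$group)
```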
From the different features we extracted, topological validity across the repetitions appeared to be the one leading to the largest Area Under the Curve. The optimal cutoff was exactly 1.5, leading to a sensitivity () and specificity ().
The optimal criterion needs to be informed about the order between inducers (i.e. to construct the polygons) and interestingly suggests that synesthetic inducers are structurally mapped following topological rules analogous to geographical space structures, hence suggesting a spatial nature for the synesthetic forms of space sequence synesthetes.